ABSTRACT

Problems in the use of pupil achievement measures for
evaluating teachers, schools, or systems are reviewed, with the
conclusion that they are disabling. The following reasons are cited:
(1) What the pupil brings to the classroom in terms of ability,
previous knowledge, home and peer influence, motivation, and other
influences is clearly very powerful in determining academic standing
at the end of the year. (2) Student achievement reflects only a small
portion of the total set of objectives for which schools increasingly
are being held accountable. (3) Taking pupil standing at year-end as
an indicator of teaching effectiveness frequently does not recognize
standing at the beginning of the year; in addition, the problems of
adjusting for prior standing are extremely serious and rarely
recognized. (4) This accountability system would reward the teacher
who teaches to the test or who gives primary attention to those
pupils who are below minimum standards but in reach of them. (5) Such
an evaluation system would probably reward teaching behavior which
promotes low cognitive level learning and penalize teaching which
promotes complex learning. (Author/HV)
FOREWORD

Increasingly state association leaders and staff, local teacher groups,
and UniServ directors find themselves having to deal with the problems of
inappropriate teacher evaluation. As most teacher leaders are aware, one
aspect of such problems is reflected in attempts to evaluate teachers on
the basis of student achievement.

The attached paper provides a combination of responses to these
problems--research findings, technical problems in using test scores, and
other considerations. It has been prepared by two nationally eminent
researchers in the field. Citing the Soars can serve to increase the
credibility of arguments against the use of student achievement for
evaluating teachers. Their examples should be particularly useful in
dialogues with school district research directors, testing and evaluation
coordinators, and other administrators who are committed to using a year's
growth in a year as a measure of "teacher competence."

In some other material we have called to attention the major reasons
why teachers must not be evaluated on the basis of student achievement.
Those reasons complement and are supported by the Soar paper and seem
worth repeating here in that context:



1. The tests themselves are inadequate for such purposes. Banesh Hoffman
has put it well:

     There is no generally satisfactory method of evaluating
     human abilities and capabilities.... Rough superficial
     evaluations are of course possible.... But the detection
     and evaluation of other than superficial ability is
     inevitably an art demanding insight, taste and knowledge.
     Current attempts to reduce it to a science and then
     mechanize it are not only dangerous but in a profound
     sense unscientific. [1]

[1] Hoffman, Banesh. "Psychometric Scientism." Phi Delta Kappan 48: 381;
April 1971.



2. The nature of student populations is so varied that outcomes are
often more influenced by those variables than by what teachers
do. Gene Glass, noted researcher, reminds us:

     Nothing short of random assignment of pupils to teachers
     as an iron-clad administrative necessity will ensure that
     the teachers were in a fair race to produce pupil gains. [2]

3. Many of the conditions which measurably affect learning outcomes
are conditions over which teachers have little or no control and
they vary widely among schools. Among them are: the number of
students teachers must work with each day; time available to
teach; planning time; up-to-dateness of curriculum; appropriateness
of materials and media; students' physical and emotional readiness
for learning; opportunity for teacher in-service education; and,
most important, decision-making power on curriculum matters.

Each of the reasons cited is considered in one form or another in the
Soar paper. And even though some of the technical explanations may go
beyond the needs of teacher leaders in responding to the issues, they serve
as backups to commonly held teacher association positions.



-- Bernard H. McKenna
   Professional Associate
   NEA Instruction and Professional Development




[2] Glass, Gene V. "Statistical and Measurement Problems in Implementing
the Stull Act." Mandated Evaluation of Educators: A Conference on California's
Stull Act. (Edited by N. L. Gage.) Washington, D.C.: Education Resources
Division, Capital Publications, 1973. p. 54.
PROBLEMS IN USING PUPIL OUTCOMES
FOR TEACHER EVALUATION

During the past few years there has been mounting pressure for
measuring the outcomes of education, with movement toward holding the
teacher, the school, and the school system "accountable" for producing
the student learning expected by society. Decreasing enrollments, tighter
budgets, and a general trend toward cost effectiveness have added to the
pressure.

Measuring pupil achievement has increasingly been proposed as a way
of assessing the effectiveness of teaching, and in fact has been mandated
by a number of states. This approach is superficially reasonable and
attractive, but it is fraught with problems which have not been generally
recognized.

H. L. Mencken once commented, "There is always a well-known solution
to every human problem--neat, plausible and wrong." The use of pupil
achievement as a way of evaluating the teacher, the school, or the school
system embodies this misleading simplicity. The solution seems so
straightforward: If the job of the teacher is to promote learning in pupils,
then it seems reasonable to evaluate the teacher in terms of the amount of
learning he produces in his pupils.

The parallel with the industrial setting is clear: If the job of the
worker is to assemble relays, then it seems reasonable to count the number
of relays the worker assembles and pay him or her accordingly. But in
applying this procedure to teaching, a number of problems emerge which
have not been widely recognized. The relay assembler receives parts which
are identical (at least within very close limits) on which he or she
performs a prescribed set of operations, also identical. Then the completed
units leave the assembler, again almost identical from one to another.

But none of this is true for the teacher. Pupils appear in the
classroom differing in ability, level of achievement, home background,
interest, motivation, age--differing in numerous ways. The teacher must
recognize these differences as he or she strives to help individual pupils
grow toward their own potential. Consequently, the teaching process will
differ from pupil to pupil. If the teacher has been successful, each pupil
will have improved educationally when he or she leaves the classroom but
each will probably be no more like the others than when the year began.

A major dimension, then, of the problem of evaluating teachers in
terms of pupil outcomes is the recognition that what goes on in the
classroom is not the only, or the most powerful, influence on where a pupil
stands in achievement at the end of the year.

Influences Other Than the Classroom

Research has shown that the differences pupils bring with them when they
enter the classroom have significant influence on achievement.

Entry level ability (pretest or fall score) and socioeconomic status
are major determiners of what a pupil's standing will be at the end of the
school year. These influences probably are more widely accepted than any
other, but they are highly interrelated, so that one overlaps the other. In
practice they cannot be effectively separated.



The fact that IQ and achievement scores in the fall are highly
related to spring achievement scores is widely accepted but seldom
documented. In a study of 81 fifth-grade classes, Soar and Soar (1973) found
correlations between class averages (means) for fall IQ and spring
achievement ranging from +.85 to +.90, and correlations between fall
achievement and spring achievement ranging from .75 to .85. So the evidence
is that as much as 80 percent of the variation in class averages for pupil
achievement at the end of the year can be accounted for by pupil
characteristics which existed at the beginning of the year, characteristics
over which the teacher has no control.
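
The step from a correlation to "percent of variation accounted for" is the
square of the correlation coefficient. As a minimal sketch (the function
name is hypothetical and not taken from the study), the fall IQ correlations
reported above convert as follows:

```python
def variance_explained(r):
    """Share of variance accounted for by a correlation r (r squared)."""
    return r ** 2


# Correlations reported above for fall IQ vs. spring achievement (class means).
for r in (0.85, 0.90):
    print(f"r = {r:+.2f} -> {variance_explained(r):.0%} of variation")
```

A correlation of +.85 corresponds to roughly 72 percent of the variation and
+.90 to roughly 81 percent, which is in line with the "as much as 80 percent"
figure.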

The most extensive data on the influence of socioeconomic status
on pupil achievement were presented in the Coleman Report, and more
recently and more widely reanalyzed by Mosteller and Moynihan (1972) and
Mayeske, et al. (1972). The studies show that as much as 80 percent of the
variation in pupil achievement across schools (equal to a correlation of
about +.90) can be accounted for by these factors.

Beyond these major influences there are others which help account for
differences in pupil achievement and which should be considered. Although
the research on family attitudes and support for learning in the home is not
as extensive as that for pupil ability (pretest) and social status, it is
consistent in indicating relationships between the educational values held
by parents and their children's achievement in school. Garber and Ware (1972)
found a relation of +.47 between achievement and a combined measure of support
for learning in the home for a group of Black and Spanish-American children.
All students in the sample met federal poverty guidelines, so that socioeconomic
status as usually measured was, in effect, held constant. The same authors
cite similar findings from other studies.



Peer group attitude, although again the research is not extensive, has
been identified as another important factor which can either support or
hinder a pupil's achievement (Anderson, 1970).

Since there is compelling evidence that a number of influences over which
the teacher has no control have powerful effects on pupil achievement, it
cannot be expected that a teacher will have consistent results with successive
groups of pupils. That is, the teacher will not be equally effective in
producing growth with all groups because groups differ so widely. Studies
by Rosenshine (1970) and Brophy (1972), for example, show that on the
average only about 10 to 15 percent of the variation in achievement from
group to group reflects the stable influence of the teacher, as shown by
a median correlation in the low .30's.

As Medley (1974) has pointed out, and as commonly accepted methods
of estimating reliability show (Cronbach, 1960, p. 131), data from about
twenty classes would be required for making reliable decisions about
individual teachers. Given this requirement necessitating collection of such
large amounts of data, using the measurement of pupil achievement as a way
to evaluate teachers is impractical as well as invalid.
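
One way to see where a figure near twenty could come from is the
Spearman-Brown formula, a standard method for estimating the reliability of
an average of k parallel measurements. The sketch below assumes that this is
the kind of method meant, and it takes .30 as the stability of a teacher's
results from a single class (an illustrative value in line with the "low
.30's" median correlation cited above). The function name is hypothetical.

```python
def spearman_brown(r_single, k):
    """Reliability of the mean of k parallel measures, each with reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)


# Assumed single-class stability of .30; reliability as more classes are averaged.
for k in (1, 5, 10, 20):
    print(k, round(spearman_brown(0.30, k), 2))
```

Under these assumptions the reliability of a teacher's average result climbs
slowly and reaches roughly .90 only at about twenty classes.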

What these findings seem to indicate is that the education of the
pupil is dependent on many conditions in the society, not on the school
alone. When the time the pupil spends in the classroom is compared with
the time he or she spends under other influences, and when the degree of
influence or control the teacher can exercise is compared with the power of
other influences, the limited effect of the teacher is not surprising.

To note that influences other than the teacher make a major difference in
how much the child learns is not to say that the role of the teacher is
unimportant. The teacher is the only formal, institutionalized input the
society has to the education of the child and the transmission of an
established curriculum. And much of what the teacher does that contributes
constructively to the child's future abilities, successes, and satisfactions
may not be measured by currently common achievement instruments. It does
say, however, that the teacher's influence is limited and that the teacher
is most effective when he or she has the support of other elements in the
society.

This whole constellation of other influences is usually not given
consideration when measures of pupil achievement are proposed as the basis
for evaluating teachers. It is reasonable that these influences are strong,
since they accumulate over the life of the pupil. It is obvious, then, that
pupil standing at the end of any school year is a completely inadequate and
even misleading measure of the effectiveness of the teacher or the school.
Yet the results of such achievement standings are frequently published by
school or by school system.

"Standing" versus "Change" as Measures of Outcome

"Achievement," which is the most frequently used measure of student
learning outcomes, usually refers to the amount of knowledge a pupil
possesses at a given point--his or her "standing." The influences cited
above show a strong relation to achievement as used in this sense.

An alternative to measuring achievement standing is to measure "change"
in achievement from the beginning to the end of the year. When this is done,
the influences cited are still likely to have an effect, although to a lesser
degree, since change reflects their influence for a shorter period of time.

Although this alternative is appealing as another way of evaluating
teaching, it raises still other problems. In a classic volume on the problems
of measuring change, Bereiter (1963) commented:

     Although it is commonplace for research to be stymied
     by some difficulty in experimental methodology, there
     are really not many instances in the behavioral
     sciences of promising questions going unresearched
     because of deficiencies in statistical methodology.
     Questions dealing with psychological change may well
     constitute the most important exceptions. It is only
     in relation to such questions that the writer has
     ever heard colleagues admit to having abandoned major
     research objectives solely because the statistical
     problem seemed to be insurmountable. (p. 3)



Difficulties in Measuring Change

If the fall score is simply subtracted from the spring score so as
to obtain a measure of net change, a new set of subtle but difficult
problems is created. An illustration may serve to identify some of them.
Figure 1 presents fictitious data from a group of pupils for whom measures
of IQ from two forms of a test have been obtained 10 days apart. The initial
IQ's are plotted on the baseline and the second IQ's on the vertical axis.
Any point in the area outlined by the ellipse represents simultaneously the
IQ of a pupil on each of the testings, and the high and low 10 percent of
the pupils at each of the two times have been indicated by shading and
cross-hatching.

It is clear that the pupils who were in an extreme group on the first
test were not, for the most part, in an extreme group on the second test.
The blackened areas represent the small number of pupils who were extreme
on both occasions.

At the upper right, the area is small because the pupils who make
the highest scores at any testing are likely to do so on two bases: (1) they
are bright (have high verbal skills), and (2) they are "lucky" (that is, they
happen to make good guesses on a few items for which they aren't sure of the
answer, or the items on this test just happen to be ones for which they know
the answers). But they are not likely to be lucky consistently when another
form of the test is given, and so on another testing their scores are likely
to be lower. Opposite influences will affect pupils at the lower left end of
the ellipse.

To put it another way, if the cutting point for the top 10 percent is an
IQ of 120, there will be a number of pupils with true IQ's close to 120 who
will sometimes be above that score on a series of tests and sometimes below
it, depending on chance factors. So some fraction of pupils above 120 on
the first test will fall below it on the second. Similarly, some fraction
of the pupils scoring below 80 on a first test will be above it on a second.

In both cases, extreme pupils have "regressed," or moved toward the
mean. This regression effect can be expected whenever prediction is less
than perfect, and the extent of the movement will depend on the inaccuracy
of the prediction (Lord, 1963). With most psychological or educational
predictions, the regression involved is considerable and may make up a
significant proportion of the total range of scores.
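
The dependence of the regression effect on the accuracy of prediction can be
written compactly. The following is a sketch under the usual
linear-regression assumptions: both testings are on the same scale with mean
\mu and equal spread, and \rho (a symbol introduced here for illustration)
is the correlation between the two testings.

```latex
\[
  E[X_2 \mid X_1 = x] = \mu + \rho\,(x - \mu), \qquad
  E[X_2 - X_1 \mid X_1 = x] = -(1 - \rho)\,(x - \mu).
\]
```

With an assumed \rho = .80 and \mu = 100, a pupil scoring 130 on the first
form would be expected to score 100 + .80(30) = 124 on the second, a drop of
6 points, and a pupil scoring 70 would be expected to rise to 76. When
\rho = 1 (perfect prediction) the expected change is zero.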

The point to be stressed from this example has important consequences:
Since pupils who were in the bottom 10 percent the first time were not, for
the most part, in that group the second time, they must have moved upward.
Similarly, the pupils in the top group must have moved downward. That is,
there is a negative relationship between initial standing and the direction in
which change is most likely.

As an example of this effect, the pupils who stand highest on an
achievement measure at the beginning of the school year will probably show
little if any increase in score at the end of the year, and may even show a
decline. On the other hand, pupils who score lowest at the beginning of
the year will probably show considerable increase. Educators have sometimes
been misled by this effect and have assumed that their programs were more
functional for low achieving pupils than for high achieving pupils, when in
reality all that was involved was the regression effect (the statistical
tendency for scores to move toward the average). Similarly, a group of pupils
placed in a remedial program because they stand low on a pretest can be
expected to show considerable improvement; but again the improvement may be
spurious, as a consequence of the regression effect.

This problem creates real difficulties if pupils are tracked on the
basis of fall scores and teachers are evaluated on the basis of change in
achievement of their pupils. For example, assume that pupils are tested in
reading in the fall and the lowest third are put in Miss Jones' class, the
middle third in Miss Smith's class, and the highest third in Mrs. Williams'
class. We can anticipate that at the end of the year Miss Jones' class will
show much improvement and Miss Smith's will show modest gain, but Mrs. Williams
will be fortunate if her pupils show any growth at all. The problem is that
the gain the pupils show is materially affected by regression effect, so to
evaluate the teacher on the basis of pupil gain would be manifestly unfair.
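
The tracking example can be illustrated with a small simulation. This is a
sketch with invented parameters, not data from any study: every pupil is
given the same true gain regardless of teacher, fall and spring scores each
carry independent measurement error, and pupils are split into thirds on the
observed fall score. All names and numbers below are hypothetical.

```python
import random
import statistics


def simulate_tracked_classes(n_pupils=3000, true_gain=10.0,
                             true_sd=12.0, error_sd=8.0, seed=1):
    """Mean observed gain per track when every teacher is equally effective."""
    rng = random.Random(seed)
    pupils = []
    for _ in range(n_pupils):
        ability = rng.gauss(50.0, true_sd)          # true fall standing
        fall = ability + rng.gauss(0.0, error_sd)   # observed fall score
        # Every pupil truly gains the same amount, whoever the teacher is.
        spring = ability + true_gain + rng.gauss(0.0, error_sd)
        pupils.append((fall, spring))
    pupils.sort(key=lambda p: p[0])                 # track on observed fall score
    third = n_pupils // 3
    tracks = {
        "low": pupils[:third],
        "middle": pupils[third:2 * third],
        "high": pupils[2 * third:],
    }
    return {name: statistics.mean(s - f for f, s in group)
            for name, group in tracks.items()}


gains = simulate_tracked_classes()
for name, gain in gains.items():
    print(f"{name:>6}: mean observed gain {gain:5.1f}")
```

Although the true gain is identical in all three tracks, the low track shows
the largest mean observed gain and the high track the smallest, purely as a
consequence of regression toward the mean.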

There are statistical procedures for attempting to eliminate this effect,
but as Bereiter (1963) commented, it is impossible to be certain that
appropriate adjustments have been made; and the expertise to do even the best
that can be done with the problem is not widespread. And, of course, all the
out-of-school influences on achievement standing discussed earlier also
influence gain, although to a lesser degree. So it is clearly inappropriate to
use pupil change as a way of evaluating teachers where a teacher may suffer as
a consequence of the error involved.
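
One adjustment of this kind, offered here only as an illustration and not
necessarily the procedure the authors have in mind, is the residual gain
score: spring scores are regressed on fall scores, and each pupil's deviation
from the predicted spring score is used in place of raw gain. A minimal
sketch with made-up scores (all names and numbers are hypothetical):

```python
import statistics


def residual_gains(fall, spring):
    """Deviation of each spring score from its least-squares prediction from fall."""
    mean_f, mean_s = statistics.mean(fall), statistics.mean(spring)
    sxx = sum((f - mean_f) ** 2 for f in fall)
    sxy = sum((f - mean_f) * (s - mean_s) for f, s in zip(fall, spring))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_f
    return [s - (intercept + slope * f) for f, s in zip(fall, spring)]


# Made-up scores for six pupils, for illustration only.
fall = [30, 40, 45, 55, 60, 70]
spring = [45, 50, 52, 63, 62, 71]
residuals = residual_gains(fall, spring)
print([round(r, 1) for r in residuals])
```

Residuals of this kind are uncorrelated with fall scores by construction,
but, as the text cautions, that alone does not establish that the adjustment
is appropriate: measurement error in the fall score and out-of-school
influences on gain remain.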



Teacher Performance Tests. A procedure for evaluating teachers which
attempts to bypass the problems of change is the performance test or the
evaluative teaching unit (Flanders, 1974). In it, the teacher teaches a
prescribed brief unit (sometimes as little as a few minutes or as much as
two weeks) and pupil knowledge is then tested. The attempt is made to
minimize the problems of measuring gain by teaching material in which pupils
should have little or no preknowledge, so that all presumably start at
"ground zero." But the other problems of using pupil achievement to evaluate
teachers still apply. In addition, there are questions of whether teaching
material which does not have to be integrated into previous knowledge
requires the same skills as the usual teaching setting and whether such
short-term learning generalizes to long-term learning. There is the final
difficulty that the performance of teachers on a unit of a few minutes does
not predict their performance on a two-week unit (McDonald, 1974). Assuming
that either can be used to predict year-long performance then seems risky.

Even if the measurement of standing or gain in achievement were a
satisfactory way of evaluating teachers, there is still the problem of
selecting the objectives to be measured.



What Objectives Should Be Measured?

Although subject matter achievement has been the primary focus of the
discussion thus far, it is clear that schools are charged with and have
accepted some degree of responsibility for many other kinds of pupil growth.
The Need for Multiple Measures. Over a long period schools have given
attention to the social development and the moral values of pupils. And a
broad view of the relationship between school and society suggests that when
a problem emerges in the society, one of the first steps is likely to be to
involve the school in solving the problem. Traffic problems led to driver
education; a concern for the loyalty of government employees led first to
a ban on teaching about communism in the schools and later to the requirement
that it be taught; problems of drug abuse have led to drug abuse education
in the schools; concern about sexual attitudes has led to sex education;
concern for occupational choice has led to career education in the schools;
and when concern for segregation of the races became pressing for the society,
the first and the major attempt to deal with the problem was delegated to
the schools. To evaluate teachers and schools solely on the basis of the
subject matter gains made by pupils grossly under-represents the broad
range of objectives for which teachers and schools have been given some
degree of responsibility. Yet for many of these objectives there are no
measures which are immediately, for some even remotely, available.
Simple Versus Complex Learning. Even within the subject matter realm
there are problems which are largely ignored. One of these problems is the
need to distinguish complex achievement growth from simple growth and to
provide appropriate measurement for each. Memory of facts (rote memory)
falls at the simplest level and complex problem solving, abstracting, and
generalizing fall at the complex level. The distinction is between retrieving
information (memory) and processing information in its varying degrees of
complexity. There is some evidence from a number of studies that the teaching
behaviors which are associated with greatest growth in simple tasks are different
from those which are associated with greatest growth in complex tasks (Solomon,
Bezdek, and Rosenberg, 1963; Soar, 1968; Soar and Soar, 1972, 1973).

Most studies of pupil achievement fail to make this distinction; and
the current stress on criterion-referenced measurement, emphasizing
"small-step" learning, seems likely to focus on simple kinds of learning.
Measures of complex learning are slow and difficult to construct, in contrast
to measures of simple learning, which can be more easily and quickly developed.
Evaluating all subject matter at all grade levels would almost certainly
require the construction of many new measures which would likely emphasize
simple kinds of achievement, given the ease with which they can be constructed
and the emphasis on criterion-referenced measurement. If teachers were to
be evaluated on the basis of pupil achievement, then, it seems likely that
the teacher who emphasizes simple learning would be more positively evaluated
than the teacher who emphasizes more complex learning. This would be an
unfortunate result.

A further problem related to the difficulty of measuring complex
achievement growth is the likelihood that some highly valued objectives
grow too slowly to show change within a school year--objectives such as
complex problem-solving skills, citizenship, attitudes, learning to get
along well with others, and creative expression. On the other hand, it seems
likely that measures of short-term learning would tend to emphasize simpler
kinds of learning.

Other Problems in the Use of Pupil Outcomes

A description of an application of accountability in England a century
ago makes one of the problems clear (Small, 1972). In that setting, teachers
were evaluated on the number of their pupils who attained the minimum level
of achievement expected for the particular grade. The result was that teachers
concentrated their efforts at the minimum level of proficiency, with a
consequent lowering of the quality of instruction.



Another problem of serious consequence in the use of pupil measures
is raised by the OEO study of performance contracting, which found that
the superior achievement of performance contracting programs disappeared
when the teaching was controlled to eliminate the possibility of teaching
the test (Page, 1972, 1973). The implication seems clear that, in a
setting in which financial return follows from pupil achievement, teaching
the test is likely to occur at least a portion of the time. This is a
very reasonable finding and one which is well-known, even in cases where a
financial return is not involved--teaching to the Regents Examination,
for example.

A final problem is the possibility of bias if the teacher is the test
administrator. Even outside test administrators have difficulty not "helping"
pupils; but where a teacher is affected personally, it seems possible that
his or her behavior might be influenced, even though unconsciously. This
problem could be dealt with by using only specially trained test administrators,
but this could be very costly.

Summary

When all these problems in the use of pupil achievement for teacher
evaluation are considered, they become overwhelming. The influence of the
teacher is minor compared to out-of-the-classroom influences--pupil ability,
previous knowledge, the home, the peer group, motivation, and others. What
the pupil brings to the classroom in this respect is clearly a much stronger
determinant of where he or she will stand at the end of the year than anything
that has been done in the classroom. Influences on the development of future
achievement measures seem likely to limit them to relatively simple measures
for some time to come. Tests available for measuring the other objectives for
which the teacher is to some degree responsible are relatively few. In addition
to these problems, there are statistical difficulties in the measurement of
change which are extremely serious, if not disabling. They are still further
exacerbated by the likely problems of teaching the test, of the teacher
giving attention primarily to a small portion of the students, and of
obtaining valid measurement in the classroom.



Taken all in all, this is an imposing array of difficulties, most of
which have gone unrecognized when it is proposed that teachers be evaluated
by measuring the outcomes of their pupils.